quantile fraction
Fully Parameterized Quantile Function for Distributional Reinforcement Learning
Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu
Distributional Reinforcement Learning (RL) differs from traditional RL in that, rather than the expectation of total returns, it estimates distributions and has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning
Oh, Jihwan, Kim, Joonkee, Jeong, Minchan, Yun, Se-Young
The multi-agent setting is intricate and unpredictable since the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms that incorporate uncertainty via distributional output have been integrated with multi-agent reinforcement learning (MARL) methods, achieving state-of-the-art performance. However, distributional MARL algorithms still rely on the traditional $\epsilon$-greedy, which does not take cooperative strategy into account. In this paper, we present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of distribution. Initially, we take expectations from the upper quantiles of state-action values for exploration, which are optimistic actions, and gradually shift the sampling region of quantiles to the full distribution for exploitation. By ensuring that each agent is exposed to the same level of risk, we can force them to take cooperatively optimistic actions. Our method shows remarkable performance in multi-agent settings requiring cooperative exploration based on quantile regression appropriately controlling the level of risk.
- Leisure & Entertainment > Games > Computer Games (0.46)
- Energy > Oil & Gas > Upstream (0.46)
Multi-compartment Neuron and Population Encoding improved Spiking Neural Network for Deep Distributional Reinforcement Learning
Sun, Yinqian, Zeng, Yi, Zhao, Feifei, Zhao, Zhuoya
Inspired by the information processing with binary spikes in the brain, the spiking neural networks (SNNs) exhibit significant low energy consumption and are more suitable for incorporating multi-scale biological characteristics. Spiking Neurons, as the basic information processing unit of SNNs, are often simplified in most SNNs which only consider LIF point neuron and do not take into account the multi-compartmental structural properties of biological neurons. This limits the computational and learning capabilities of SNNs. In this paper, we proposed a brain-inspired SNN-based deep distributional reinforcement learning algorithm with combination of bio-inspired multi-compartment neuron (MCN) model and population coding method. The proposed multi-compartment neuron built the structure and function of apical dendritic, basal dendritic, and somatic computing compartments to achieve the computational power close to that of biological neurons. Besides, we present an implicit fractional embedding method based on spiking neuron population encoding. We tested our model on Atari games, and the experiment results show that the performance of our model surpasses the vanilla ANN-based FQF model and ANN-SNN conversion method based Spiking-FQF models. The ablation experiments show that the proposed multi-compartment neural model and quantile fraction implicit population spike representation play an important role in realizing SNN-based deep distributional reinforcement learning.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Idaho > Ada County > Boise (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Health & Medicine > Therapeutic Area > Neurology (0.93)
- Leisure & Entertainment > Games > Computer Games (0.88)
DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning
Ma, Xiaoteng, Xia, Li, Zhou, Zhengyuan, Yang, Jun, Zhao, Qianchuan
In this paper, we present a new reinforcement learning (RL) algorithm called Distributional Soft Actor Critic (DSAC), which exploits the distributional information of accumulated rewards to achieve better performance. Seamlessly integrating SAC (which uses entropy to encourage exploration) with a principled distributional view of the underlying objective, DSAC takes into consideration the randomness in both action and rewards, and beats the state-of-the-art baselines in several continuous control benchmarks. Moreover, with the distributional information of rewards, we propose a unified framework for risk-sensitive learning, one that goes beyond maximizing only expected accumulated rewards. Under this framework we discuss three specific risk-related metrics: percentile, mean-variance and distorted expectation. Our extensive experiments demonstrate that with distribution modeling in RL, the agent performs better for both risk-averse and risk-seeking control tasks.
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Fully Parameterized Quantile Function for Distributional Reinforcement Learning
Yang, Derek, Zhao, Li, Lin, Zichuan, Qin, Tao, Bian, Jiang, Liu, Tieyan
Distributional Reinforcement Learning (RL) differs from traditional RL in that, rather than the expectation of total returns, it estimates distributions and has achieved state-of-the-art performance on Atari Games. The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution. Existing distributional RL algorithms parameterize either the probability side or the return value side of the distribution function, leaving the other side uniformly fixed as in C51, QR-DQN or randomly sampled as in IQN. In this paper, we propose fully parameterized quantile function that parameterizes both the quantile fraction axis (i.e., the x-axis) and the value axis (i.e., y-axis) for distributional RL. Our algorithm contains a fraction proposal network that generates a discrete set of quantile fractions and a quantile value network that gives corresponding quantile values. The two networks are jointly trained to find the best approximation of the true distribution. Experiments on 55 Atari Games show that our algorithm significantly outperforms existing distributional RL algorithms and creates a new record for the Atari Learning Environment for non-distributed agents.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)